library(knitr)
library(dplyr)
library(ggbiplot)
library(plotly)

Visualising the political preferences

Questionnaire

Brace yourself! The Polish parliamentary election is coming on October 13!

In Poland, as I guess everywhere, politicians promise a lot and evade direct answers. That is why I really like tools such as http://latarnikwyborczy.pl, widely used by Polish voters before elections (unfortunately there is no English version). It lets citizens compare their views on the most discussed issues with the answers given by the political parties, so it gives you a clear overview of the politicians’ opinions.

This year’s questionnaire consists of 20 questions, which are not divided into groups. There are basically three possible answers to each question: “I agree”, “I have no opinion”, “I disagree”. The final result is the percentage overlap between your answers and those of each political party. Currently there are five of them. Of course this does not determine your vote, as there might be other reasons to support or reject a given group, but it gives a nice general view.
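
To make the “percentage overlap” concrete, here is a minimal sketch in R. It assumes the score is simply the share of questions on which your answer equals the party’s answer; the exact formula used by Latarnik Wyborczy is not described here and may differ. The function name `match_percent` and both example vectors are made up for illustration.

```r
# Hypothetical helper: share (in %) of questions answered identically.
# Assumption: every question has equal weight, and "no opinion" counts
# as a match only when both sides chose it.
match_percent <- function(user, party) {
  stopifnot(length(user) == length(party))
  100 * mean(user == party)
}

# Made-up answers to five questions, coded 1 = agree, 0 = no opinion, -1 = disagree
user_answers  <- c(1, -1, 0,  1, 1)
party_answers <- c(1, -1, 1, -1, 1)

match_percent(user_answers, party_answers)
```

A single number like this hides which questions you disagree on, which is exactly the limitation the rest of this report tries to address.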

Since visualising data is both my job and my hobby, I feel there might be a better way to present the results than a single percentage. In this short report, let me show you two options that come to my mind.

Parties and questions

I guess it is high time to introduce our players! (The short names used in the plot legends, derived from the Polish names, are given in brackets; the list is ordered alphabetically by short name.)

  • Civic Coalition: former ruling party Civic Platform + Modern (Nowoczesna); the current President of the European Council was its leader before his nomination (KO);
  • Confederation Freedom and Independence: right-wing and eurosceptic alliance; according to the polls they are hovering around the 5% electoral threshold (Konfederacja);
  • The Left: left-wing alliance formed by the parties SLD + Wiosna + Razem (Lewica);
  • Polish Coalition: Polish People’s Party (traditional rural party) + former rock musician Paweł Kukiz, an interesting mix (PSL);
  • Law and Justice: current ruling party; the polls give it the best chance of winning again (PiS).

And the list of questions in order of appearance:

QUESTIONS <- list(
  question01 = "Q1: The scope of competence of local self-governments should be gradually expanded.",
  question02 = "Q2: The headquarters of some central offices should be moved outside of Warsaw to other cities throughout the country.",
  question03 = "Q3: The independence of the judiciary from parliament and government should be strengthened.",
  question04 = "Q4: Retirement age should be increased.",
  question05 = "Q5: The income tax-free amount should be increased significantly.",
  question06 = "Q6: A patient in a public healthcare system facility should be able to pay for a higher standard of medical services.",
  question07 = "Q7: The scope of theoretical knowledge taught in schools should be limited in favor of the development of students' skills.",
  question08 = "Q8: The priority of the state's cultural policy should be to strengthen national identity.",
  question09 = "Q9: Public funds for learning should be focused on supporting the best universities in the country.",
  question10 = "Q10: Information media should remain under Polish financial control.",
  question11 = "Q11: Christian values should be the basis of the state's social policy.",
  question12 = "Q12: The abortion law should be relaxed.",
  question13 = "Q13: Same-sex couples should be able to enter into a legal partnership.",
  question14 = "Q14: Coal should remain the primary source of energy in Poland.",
  question15 = "Q15: Poland should accelerate the construction of a nuclear power plant.",
  question16 = "Q16: The development and strengthening of Territorial Defense Forces should be continued.",
  question17 = "Q17: Poland should accept a larger number of economic immigrants from other countries.",
  question18 = "Q18: The European Union should have less influence on Polish internal policy.",
  question19 = "Q19: Poland should support the deepening of European integration in the field of foreign and defense policy.",
  question20 = "Q20: Poland should strive to maintain international sanctions imposed on Russia after its aggression against Ukraine."
)

I think you can more or less guess most of the answers given by each party. I wrote down all the answers in a simple CSV file (manually, no fancy web scraping this time), so now we can load them easily. To make my life easier in the later computations, I’ve coded the answers as follows:

  • 1: I agree
  • 0: I have no opinion
  • -1: I disagree

ANSWERS <- read.csv2("./app/data/answers.csv")
Party question01 question02 question03 question04 question05 question06 question07 question08 question09 question10
Lewica 1 1 1 -1 1 -1 1 -1 -1 -1
PSL 1 1 1 -1 1 1 1 1 -1 -1
Konfederacja 1 1 1 0 1 1 1 1 1 0
KO 1 1 1 -1 1 1 1 -1 -1 -1
PiS -1 1 -1 -1 0 -1 1 1 -1 1
Party question11 question12 question13 question14 question15 question16 question17 question18 question19 question20
Lewica -1 1 1 -1 0 -1 1 -1 1 1
PSL 1 -1 -1 -1 -1 -1 -1 1 1 1
Konfederacja 1 -1 -1 1 1 1 -1 1 -1 1
KO -1 -1 1 -1 -1 -1 1 0 1 1
PiS 1 -1 -1 1 1 1 -1 -1 1 1
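
If you do not have the CSV file at hand, the table above can be rebuilt directly in R. This is only a convenience sketch: the values are transcribed by hand from the printed table, and the object names `answers_matrix` and `ANSWERS_INLINE` are made up for this example and are not part of the original script.

```r
# Answers transcribed from the table above (1 = agree, 0 = no opinion, -1 = disagree).
# Rows are parties, columns are question01 ... question20.
answers_matrix <- rbind(
  Lewica       = c( 1, 1,  1, -1, 1, -1, 1, -1, -1, -1, -1,  1,  1, -1,  0, -1,  1, -1,  1, 1),
  PSL          = c( 1, 1,  1, -1, 1,  1, 1,  1, -1, -1,  1, -1, -1, -1, -1, -1, -1,  1,  1, 1),
  Konfederacja = c( 1, 1,  1,  0, 1,  1, 1,  1,  1,  0,  1, -1, -1,  1,  1,  1, -1,  1, -1, 1),
  KO           = c( 1, 1,  1, -1, 1,  1, 1, -1, -1, -1, -1, -1,  1, -1, -1, -1,  1,  0,  1, 1),
  PiS          = c(-1, 1, -1, -1, 0, -1, 1,  1, -1,  1,  1, -1, -1,  1,  1,  1, -1, -1,  1, 1)
)
colnames(answers_matrix) <- sprintf("question%02d", 1:20)

# Same layout as the CSV: a Party column followed by the 20 questions
ANSWERS_INLINE <- data.frame(Party = rownames(answers_matrix),
                             answers_matrix,
                             row.names = NULL)

# Sanity checks: five parties, 21 columns, only the three allowed codes
dim(ANSWERS_INLINE)
all(answers_matrix %in% c(-1, 0, 1))
```

Using `ANSWERS_INLINE` in place of `ANSWERS` should let the later steps run without the file, assuming the transcription is faithful.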

Extracting principal components

Observing political views in a 20-dimensional space is not very convenient for human perception. There are nice visualisation solutions for this kind of problem, such as radar (spider) charts or parallel coordinates plots, but they still seem to go into too much detail. The answers to the detailed questions follow directly from each party’s general attitude: conservative or liberal, for or against European integration, a more social or a more free-market approach to the economy, and so on.

The aim of this part is to identify the general directions that differentiate the political parties and to visualise them. For this, Principal Component Analysis (PCA) will be used. Let us look at the answers again: questions 2, 7 and 20 received the same (positive) answer from every party. So moving the headquarters of central offices outside Warsaw, teaching more practical skills at schools and maintaining sanctions on Russia are common ground for all parties. It is nice that they agree on something (or at least know that this is how the voters see it). We will need to focus on the other questions to spot the differences between parties.
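
Spotting such “common ground” questions by eye works for 20 columns, but it can also be done programmatically. Below is a small sketch on a hand-typed excerpt of the answers (questions 1, 2, 7, 12 and 20); the data frame `answers_excerpt` and the helper `is_constant` are hypothetical names introduced only for this example.

```r
# Excerpt of the answers table: five parties, five of the twenty questions
answers_excerpt <- data.frame(
  Party      = c("Lewica", "PSL", "Konfederacja", "KO", "PiS"),
  question01 = c( 1,  1,  1,  1, -1),
  question02 = c( 1,  1,  1,  1,  1),
  question07 = c( 1,  1,  1,  1,  1),
  question12 = c( 1, -1, -1, -1, -1),
  question20 = c( 1,  1,  1,  1,  1)
)

# A question carries no information for PCA if every party gave the same answer
is_constant <- function(x) length(unique(x)) == 1

question_cols <- setdiff(names(answers_excerpt), "Party")
constant_questions <- question_cols[sapply(answers_excerpt[question_cols], is_constant)]
constant_questions
```

Selecting columns by name in this way is less error-prone than hard-coding column positions, at the cost of a few extra lines.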

To perform PCA we will use the prcomp function from the stats package, which ships with R:

# drop the Party column (1) and the constant questions 2, 7 and 20 (columns 3, 8 and 21)
ANSWERS_PCA <- ANSWERS[, -c(1, 3, 8, 21)]
# prepare the PCA model
model_pca <- prcomp(ANSWERS_PCA)
# check the results
summary(model_pca)
## Importance of components:
##                           PC1    PC2    PC3    PC4       PC5
## Standard deviation     2.9271 2.0622 1.4261 0.8636 4.592e-16
## Proportion of Variance 0.5492 0.2726 0.1304 0.0478 0.000e+00
## Cumulative Proportion  0.5492 0.8218 0.9522 1.0000 1.000e+00

The key decision in PCA is how many components to keep. The rule of thumb is to keep adding components as long as each one still adds a meaningful share of explained variance (the gain shrinks with each component: the first explains the most variance, the second the next most, and so on). Usually two or three are enough, since the main reason for doing PCA is to reduce the number of variables. Here the first two components explain over 82% of the variance, which would be enough. But hey, it will be much more fun to present the political views in 3D space! So let’s add the third one as well, which brings the total to about 95%.
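
The “Proportion of Variance” row is nothing more than the squared standard deviations divided by their sum. The sketch below reproduces it from the standard deviations printed above, typed in by hand; `sdev_printed` is a hypothetical name, and on the fitted model the same numbers are available as `model_pca$sdev`.

```r
# Standard deviations copied from the summary() output above;
# PC5 (4.592e-16) is floating-point noise and is treated as zero here
sdev_printed <- c(PC1 = 2.9271, PC2 = 2.0622, PC3 = 1.4261, PC4 = 0.8636, PC5 = 0)

# Each component's variance is its squared standard deviation
variance <- sdev_printed^2

prop_var <- variance / sum(variance)   # share of variance per component
cum_var  <- cumsum(prop_var)           # running total

round(rbind(prop_var, cum_var), 4)
```

The fifth component is empty for a structural reason: prcomp centres the data, and five centred observations span at most four dimensions, so four components already capture everything.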

And now comes the crucial part: understanding what the components are and naming them. This is something of an art, so the answers might depend on the analyst performing the PCA. The interpretation is based on the magnitude and direction of each question’s influence (its loading) on the component. The results are as follows:

model_pca$rotation[,1:3] %>% round(., 4)
##                PC1     PC2     PC3
## question01 -0.1745 -0.3234 -0.1790
## question03 -0.1745 -0.3234 -0.1790
## question04  0.0838 -0.1069 -0.2118
## question05 -0.0872 -0.1617 -0.0895
## question06  0.0014 -0.4991  0.0597
## question08  0.3410 -0.1146  0.2279
## question09  0.1676 -0.2139 -0.4235
## question10  0.2582  0.2164 -0.0327
## question11  0.3410 -0.1146  0.2279
## question12 -0.1759  0.1758 -0.2387
## question13 -0.3410  0.1146 -0.2279
## question14  0.3420  0.1095 -0.2445
## question15  0.2541  0.1974 -0.3638
## question16  0.3420  0.1095 -0.2445
## question17 -0.3410  0.1146 -0.2279
## question18  0.0840 -0.4685  0.0543
## question19 -0.1676  0.2139  0.4235
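
One practical way to read such a table is to sort the questions by the absolute value of their loading on a component and then look at the signs of the top entries. Here is a minimal sketch for PC1, with the loadings typed in from the output above. The vector name `pc1_loadings` is hypothetical; on the fitted model you would take `model_pca$rotation[, "PC1"]` instead.

```r
# PC1 loadings copied from the printed rotation matrix
pc1_loadings <- c(
  question01 = -0.1745, question03 = -0.1745, question04 =  0.0838,
  question05 = -0.0872, question06 =  0.0014, question08 =  0.3410,
  question09 =  0.1676, question10 =  0.2582, question11 =  0.3410,
  question12 = -0.1759, question13 = -0.3410, question14 =  0.3420,
  question15 =  0.2541, question16 =  0.3420, question17 = -0.3410,
  question18 =  0.0840, question19 = -0.1676
)

# Order by magnitude, keeping the sign so the direction stays visible
ranked <- pc1_loadings[order(abs(pc1_loadings), decreasing = TRUE)]
head(ranked, 6)
```

Keep in mind that the overall sign of a principal component is arbitrary, so only the relative signs of the loadings within a component are meaningful.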

Some pairs of questions have exactly the same loadings on all components. Coincidence? I think not! They simply have the same combination of answers across the parties, which is bound to happen with only five observations in the dataset. Well, it is easier to decide which party definitely does not deserve your vote and which does not deserve it a little bit less.
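
Such duplicates can be found mechanically by grouping questions that share an identical answer vector. Below is a sketch on a hand-typed excerpt of the answers (questions 1, 3, 8, 11, 13 and 17, with rows in the order Lewica, PSL, Konfederacja, KO, PiS); `answers_excerpt` and `answer_key` are hypothetical names.

```r
answers_excerpt <- data.frame(
  question01 = c( 1,  1,  1,  1, -1),
  question03 = c( 1,  1,  1,  1, -1),
  question08 = c(-1,  1,  1, -1,  1),
  question11 = c(-1,  1,  1, -1,  1),
  question13 = c( 1, -1, -1,  1, -1),
  question17 = c( 1, -1, -1,  1, -1)
)

# Collapse each column into one string so identical columns get identical keys
answer_key <- sapply(answers_excerpt, paste, collapse = ",")

# Group question names by key and keep only groups with more than one member
groups <- split(names(answer_key), answer_key)
duplicates <- unname(Filter(function(g) length(g) > 1, groups))
duplicates
```

Note that questions 13 and 17 are the exact mirror image of questions 8 and 11, which is why their loadings in the table above are identical up to sign.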

The ggbiplot package provides a helper plot that makes the questions’ influence easier to understand. Let us look at the visualisations:

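# par(mfrow = ...) only affects base graphics, so the two interactive
# biplots are placed side by side with plotly::subplot() instead
biplot_12 <- ggplotly(ggbiplot(model_pca, labels = ANSWERS$Party, choices = c(1, 2)))
biplot_13 <- ggplotly(ggbiplot(model_pca, labels = ANSWERS$Party, choices = c(1, 3)))
subplot(biplot_12, biplot_13, titleX = TRUE, titleY = TRUE)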